IRT Linking Procedures

Abstract 

In many practical applications of item response theory, the 
parameters of overlapping subsets of test items are estimated from 
different samples of examinees. A linking procedure is then 
employed to place the resulting item parameter estimates onto a 
common scale. It is standard practice to ignore the uncertainty 
associated with the linking step when drawing inferences that 
involve items from different subsets, a situation that arises, for 
example, in the measurement of change. This paper outlines how 
the uncertainty can be accounted for, and exemplifies the ideas 
with a jackknife approximation for the Stocking-Lord linking
procedure. Examples from the National Assessment of Educational
Progress suggest that the resulting uncertainty will usually be
negligible for inferences about individuals, but can constitute a
major source of estimation error in aggregate statistics such as
changes in group means.
1.0 Introduction



A widely cited advantage of item response theory (IRT) in 
educational measurement is its capability to provide proficiency 
estimates on a common scale when different examinees are 
administered different items, or when examinees are administered 
different items at different points in time. A common practice is 
to estimate the parameters of a large number of test items, treat
the estimates as known true parameters, and calculate proficiency 
estimates for individuals or groups based on responses to selected 
subsets of items. Practical considerations often preclude 
administering all items to a single sample of examinees in order 
to obtain the initial item parameter estimates; rather, estimates 
for overlapping sets of items are obtained from separate samples 
of examinees, then linked to a common scale. While it is 
generally recognized that the parameters of the required linking 
functions used in practice are estimates rather than known 
constants, the effects of the uncertainty associated with them 
upon subsequent analyses are rarely taken into account. 

This paper lays out a framework for incorporating the
uncertainty associated with IRT linking procedures in subsequent
estimates of individual or group change. The ideas are
implemented for the linking procedure given by Stocking and Lord
(1983), and illustrated with data from the 1984 and 1986 reading
surveys of the National Assessment of Educational Progress.
2.0 The Three-Parameter Logistic Item Response Model



The 3PL model expresses the probability of a correct response
to an item as a function of (i) the examinee's proficiency level
$\theta_i$, and (ii) three parameters characterizing the item,
$\beta_j = (a_j, b_j, c_j)$ for $j = 1, \ldots, n$. The parameter $a_j$, called the
discrimination or slope parameter, characterizes the item's
sensitivity to proficiency. The parameter $b_j$, called the
threshold parameter, is a measure of item difficulty. The
parameter $c_j$ is the probability that an individual with very low
proficiency will respond correctly to the item. The conditional
probability of a correct response to any single item, denoted
$P_j(\theta_i)$, is obtained as

$$P_j(\theta_i) = P(x_{ij} = 1 \mid \theta_i, \beta_j) = c_j + \frac{1 - c_j}{1 + \exp[-1.7a_j(\theta_i - b_j)]}, \qquad (1)$$

where the item response $x_{ij} = 1$ if correct and 0 if not. Under
the usual assumption of local or conditional independence, the
probability of a vector of observed item responses, $x_i = (x_{i1}, \ldots, x_{in})$,
given a known proficiency value $\theta_i$, can be expressed
as a product over items as follows:

$$P(x_i \mid \theta_i, \beta) = \prod_{j=1}^{n} P(x_{ij} = 1 \mid \theta_i, \beta_j)^{x_{ij}}\,[1 - P(x_{ij} = 1 \mid \theta_i, \beta_j)]^{1 - x_{ij}} = \prod_{j=1}^{n} P_j(\theta_i)^{x_{ij}}\,[1 - P_j(\theta_i)]^{1 - x_{ij}}. \qquad (2)$$

Because $P_j(\theta_i)$ is defined as a function of $a_j(\theta_i - b_j)$, the
origin and unit of measurement of the proficiency metric are
undetermined. That is, for any rescaling constants A and B, if

$$\theta_i^* = A\theta_i + B, \quad b_j^* = Ab_j + B, \quad \text{and} \quad a_j^* = A^{-1}a_j,$$

then $P_j(\theta_i)$ is unchanged.

Since any such linear transformation of the scale retains the
meaning and the implications of all parameter values, the unit
size and origin of the $\theta$ scale must be determined arbitrarily by
the researcher.
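A minimal Python sketch of equation (1) and of the indeterminacy just described. The examinee and item values are hypothetical; the check simply confirms numerically that mapping $\theta$ and $b$ through $A\theta + B$, and dividing $a$ by A, leaves the response probability unchanged.

```python
import math

D = 1.7  # scaling constant used in equation (1)


def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model, equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical examinee and item values, for illustration only.
theta, a, b, c = 0.3, 1.2, -0.4, 0.2
A, B = 1.5, -0.7  # arbitrary rescaling constants (A > 0)

p_original = p3pl(theta, a, b, c)
# theta* = A*theta + B, b* = A*b + B, a* = a/A; c is left unchanged.
p_rescaled = p3pl(A * theta + B, a / A, A * b + B, c)
print(p_original, p_rescaled)  # equal up to floating-point rounding
```

Because only the product $a_j(\theta_i - b_j)$ enters the model, no choice of A and B can be preferred on the basis of the response data alone.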

Two widely used procedures for estimating the item parameters 
of n items under the 3PL model are: joint maximum 
likelihood, the approach incorporated in the LOGIST program 
(Wingersky, Barton, and Lord, 1982); and marginal maximum 
likelihood, the approach incorporated in the BILOG program 
(Mislevy and Bock, 1982). In both of these programs, the 
aforementioned linear indeterminacy is resolved by standardizing 
the distribution of proficiency in the calibration sample in one 
way or another. The resulting item parameter estimates, and the 
scale they implicitly define, are then typically taken as fixed 
when used to estimate individual examinees' proficiencies (as may
be required for selection or placement decisions) or population
characteristics such as group means (as may be required in
educational surveys such as NAEP). In order to focus attention on
the impact of the uncertainty in the linking functions, we shall
not deal with the uncertainty in the item parameter estimates 
themselves. The interested reader is referred to Lewis (1985) and 
Tsutakawa (1986) for more on this latter topic. 
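To make concrete what it means to treat item parameter estimates as fixed when estimating an individual's proficiency, the following sketch maximizes the likelihood (2) over $\theta$ by Fisher scoring, with the item parameters held constant, and reports the usual standard error $1/\sqrt{I(\theta)}$. It is a simple illustration, not the algorithm used by LOGIST or BILOG, and the seven items and the response pattern are hypothetical.

```python
import math

D = 1.7  # scaling constant in equation (1)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response, equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def score_and_info(theta, items, x):
    """Derivative of the log of (2) with respect to theta, and the test
    information, for fixed item parameters items = [(a, b, c), ...]."""
    score, info = 0.0, 0.0
    for (a, b, c), xj in zip(items, x):
        p = p3pl(theta, a, b, c)
        dp = D * a * (p - c) * (1.0 - p) / (1.0 - c)  # dP/dtheta
        score += (xj - p) * dp / (p * (1.0 - p))
        info += dp * dp / (p * (1.0 - p))
    return score, info


def mle_theta(items, x, theta=0.0, max_iter=200, tol=1e-10):
    """Fisher-scoring MLE of theta; returns (theta_hat, 1/sqrt(information))."""
    for _ in range(max_iter):
        score, info = score_and_info(theta, items, x)
        step = max(-1.0, min(1.0, score / info))  # cap each step at 1.0
        theta = max(-4.0, min(4.0, theta + step))  # keep theta in [-4, 4]
        if abs(step) < tol:
            break
    _, info = score_and_info(theta, items, x)
    return theta, 1.0 / math.sqrt(info)


# Hypothetical 7-item test and a mixed response pattern.
items = [(1.0, b, 0.2) for b in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
x = [1, 1, 1, 1, 0, 0, 0]
theta_hat, se = mle_theta(items, x)
print(round(theta_hat, 3), round(se, 3))
```

All-correct or all-incorrect patterns have no finite MLE; the clamp to [-4, 4] only keeps the sketch from running away in that case.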

3.0 Linking Transformations 

Often, it is not feasible to administer all of the items in a 
large item pool to a single sample of examinees. Instead, 
overlapping subsets of items are administered to different samples 
of examinees. When practical considerations preclude a concurrent 
calibration of all sample data together, as may be the case when 
the various samples are collected at different points in time, 
then independent calibrations must be performed on the data 
collected from each sample. If the IRT model is true, the 
parameter estimates obtained for items common to two or more 
calibrations will differ by (i) estimation error, and (ii) an 
unknown linear transformation. 

In this paper, we address the simple case of two tests that 
share a subset of common items. Each test is independently 
calibrated on a different sample of examinees. The two 
calibration samples could represent the same group of examinees 
tested at two different points in time, or two different groups of 
examinees for which comparisons are to be made. We refer to the 
scale established by the calibration of the first sample as the 
target scale and the scale established by the calibration of the 
second sample as the provisional scale. The inferential problems
are, first, to estimate the linear transformation needed to bring
the item parameter and proficiency estimates from the provisional 
scale to the target scale, and second, to account for the 
uncertainty of the linking procedure when stating the precision of 
resulting statistics. This simple case can be generalized to the 
more complex calibration problem which arises when multiple forms 
of a test are calibrated on several independent samples of 
examinees.



3.1 The Stocking-Lord Linking Procedure

A number of approaches have been suggested for estimating 
linking transformations. Several attempt to match characteristics 
of the distributions of a and b parameter estimates on the target 
scale and reexpressed scale (e.g., Marco, 1977), possibly with
differential weighting of estimates to account for the precision
with which they have been estimated (Linn, Levine, Hastings, and
Wardrop, 1980) or to discount the influence of outliers (Bejar and
Wingersky, 1981). The Stocking-Lord (1983) procedure, which we 
employ in the sequel, minimizes the average squared difference 
between test characteristic curves (TCCs) estimated from the two 
sets of item parameters available for the common items. 
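For contrast with the curve-matching approach, one simple member of the moment-matching family is often called the mean/sigma method: A is the ratio of the standard deviations of the common-item b estimates on the two scales, and B aligns their means. The sketch assumes error-free, hypothetical b values, so it recovers the generating constants exactly; it is not the procedure used in this paper.

```python
from statistics import mean, pstdev


def mean_sigma(b_target, b_prov):
    """Mean/sigma linking constants (A, B) from common-item b estimates,
    chosen so that A*b_prov + B has the mean and SD of b_target."""
    A = pstdev(b_target) / pstdev(b_prov)
    B = mean(b_target) - A * mean(b_prov)
    return A, B


# Hypothetical provisional-scale difficulties and generating constants.
b_prov = [-1.2, -0.3, 0.4, 1.1]
A_true, B_true = 1.12, -0.44
b_target = [A_true * b + B_true for b in b_prov]

A_hat, B_hat = mean_sigma(b_target, b_prov)
print(A_hat, B_hat)  # approximately 1.12 and -0.44
```

With real estimates the two sets of b values also differ by estimation error, which is why weighted and outlier-resistant variants have been proposed.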

The input data to the Stocking-Lord procedure consist of two
sets of parameter estimates for the common items, one set
expressed on the target scale and one set expressed on the
provisional scale. For item j, we denote these estimated
parameters as $(\hat a_{j1}, \hat b_{j1}, \hat c_{j1})$ and $(\hat a_{j2p}, \hat b_{j2p}, \hat c_{j2p})$, respectively.
The goal is to estimate the parameters A and B of the linking
transformation that can be used to produce rescaled parameter
estimates $(\hat a_{j2r}, \hat b_{j2r}, \hat c_{j2r})$, where

$$\hat b_{j2r} = A\hat b_{j2p} + B \quad \text{and} \quad \hat a_{j2r} = \hat a_{j2p}/A.$$

(The lower asymptote is unaffected by the transformation: $\hat c_{j2r} = \hat c_{j2p}$.) After A and B have been
estimated from the items common to both calibrations, this same
linking transformation is applied to the parameters of the items
that appeared in the second calibration only, in order to bring
them to the target scale.
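The reexpression step amounts to two small functions. The A and B values below are the Run 0 estimates reported in Table 2; applied to the 1986 age-group means on the provisional scale in Table 1, they reproduce the 1986 means on the 1984 calibration scale to three decimals. The single item triple is hypothetical.

```python
def reexpress_items(items_prov, A, B):
    """Map (a, b, c) triples from the provisional scale to the target scale."""
    return [(a / A, A * b + B, c) for a, b, c in items_prov]


def reexpress_theta(theta_prov, A, B):
    """Map a proficiency value (or a group mean) to the target scale."""
    return A * theta_prov + B


A, B = 1.122196, -0.442910  # Run 0 estimates from Table 2

print(reexpress_items([(0.9, 0.25, 0.18)], A, B))  # hypothetical item

means_86_prov = {9: -0.375, 13: 0.571, 17: 0.874}  # Table 1, 86 Calib.
means_86_target = {age: round(reexpress_theta(m, A, B), 3)
                   for age, m in means_86_prov.items()}
print(means_86_target)  # {9: -0.864, 13: 0.198, 17: 0.538}
```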

Estimation of A and B is accomplished by minimizing the
squared difference between estimated true scores (expected numbers
correct) on the $n_c$ common items at N preselected values of $\theta$. The
function to be minimized is

$$F(A,B) = \frac{1}{N}\sum_{i=1}^{N}\left[\xi_1(1,0,\theta_i) - \xi_2(A,B,\theta_i)\right]^2, \qquad (3)$$

where $\xi_1(1,0,\theta_i)$ is the true score associated with the proficiency
level $\theta_i$ calculated from the common items using the item
parameter estimates expressed on the target scale, and $\xi_2(A,B,\theta_i)$
is the true score associated with the proficiency level $\theta_i$
calculated from the common items using the item parameter
estimates which were originally obtained on the provisional scale
and then reexpressed on the target scale with the rescaling
parameters A and B. That is,

$$\xi_1(1,0,\theta_i) = \sum_{j=1}^{n_c} P_j(\theta_i;\, \hat a_{j1}, \hat b_{j1}, \hat c_{j1})$$

and

$$\xi_2(A,B,\theta_i) = \sum_{j=1}^{n_c} P_j(\theta_i;\, \hat a_{j2p}/A,\; A\hat b_{j2p} + B,\; \hat c_{j2p}).$$

The values $\theta_1, \ldots, \theta_N$, which are selected rather than
estimated, play the role of the independent variables in a
regression analysis. They should be selected to ensure that the
function given in (3) is minimized over the entire (expected)
range of the target proficiency scale.
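The criterion (3) and its minimization can be sketched in a few dozen lines of Python. The optimizer (Gauss-Newton on the TCC residuals, with a forward-difference Jacobian and step halving) is our own choice for illustration and not necessarily what Stocking and Lord (1983) or the TBLT program use. The four common items are hypothetical; their provisional-scale parameters are generated from the target-scale ones with A = 1.12 and B = -0.44, so the model holds exactly and the minimizer should return those values.

```python
import math

D = 1.7


def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Expected number correct on the common items at proficiency theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def residuals(A, B, thetas, items_target, items_prov):
    """xi_1(1,0,theta_i) - xi_2(A,B,theta_i) at each preselected theta_i."""
    rescaled = [(a / A, A * b + B, c) for a, b, c in items_prov]
    return [true_score(t, items_target) - true_score(t, rescaled) for t in thetas]


def criterion(A, B, thetas, items_target, items_prov):
    """F(A, B) of equation (3): mean squared difference between the TCCs."""
    r = residuals(A, B, thetas, items_target, items_prov)
    return sum(v * v for v in r) / len(r)


def stocking_lord(thetas, items_target, items_prov, A=1.0, B=0.0,
                  max_iter=50, h=1e-6):
    args = (thetas, items_target, items_prov)
    f = criterion(A, B, *args)
    for _ in range(max_iter):
        r = residuals(A, B, *args)
        # Forward-difference Jacobian of the residuals w.r.t. A and B.
        jA = [(u - v) / h for u, v in zip(residuals(A + h, B, *args), r)]
        jB = [(u - v) / h for u, v in zip(residuals(A, B + h, *args), r)]
        # Solve the 2x2 normal equations (J'J) d = -J'r.
        saa = sum(v * v for v in jA)
        sbb = sum(v * v for v in jB)
        sab = sum(u * v for u, v in zip(jA, jB))
        ga = sum(u * v for u, v in zip(jA, r))
        gb = sum(u * v for u, v in zip(jB, r))
        det = saa * sbb - sab * sab
        dA = -(sbb * ga - sab * gb) / det
        dB = -(saa * gb - sab * ga) / det
        # Halve the step until the criterion decreases.
        step = 1.0
        while step > 1e-10:
            f_new = criterion(A + step * dA, B + step * dB, *args)
            if f_new < f:
                A, B, f = A + step * dA, B + step * dB, f_new
                break
            step /= 2.0
        else:
            break  # no improving step: treat as converged
    return A, B


# Hypothetical common items on the target scale; provisional-scale values
# are built so that a/A_true and A_true*b + B_true map them back exactly.
A_true, B_true = 1.12, -0.44
items_target = [(1.0, -1.0, 0.20), (1.3, -0.2, 0.15),
                (0.8, 0.5, 0.25), (1.1, 1.2, 0.20)]
items_prov = [(a * A_true, (b - B_true) / A_true, c) for a, b, c in items_target]
thetas = [-3.0 + 0.25 * k for k in range(25)]  # N = 25 points from -3 to 3

A_hat, B_hat = stocking_lord(thetas, items_target, items_prov)
print(round(A_hat, 4), round(B_hat, 4))  # close to 1.12 and -0.44
```

With real calibrations the two curves never coincide exactly, so the minimum of (3) is positive and its size gives a rough indication of model misfit.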

We note in passing that under this procedure, the common 
items end up with three sets of item parameter estimates, one set 
expressed on the provisional scale, and two sets expressed on the 
target scale. Alternative procedures for combining the two sets 
of estimates expressed on the target scale are given in McKinley 
(1988). 
3.2 A Jackknife Approximation for the Uncertainty of the
Stocking-Lord Linking Procedure 

The uncertainty associated with the estimated rescaling
parameters A and B of the Stocking-Lord linking procedure can be
approximated using a jackknife procedure (Mosteller and Tukey,
1977). Although alternative jackknife implementations may be
appropriate for the problem described here, for the purposes of
illustration we present a single variation only. The variation
presented is an example of an interpenetrating jackknife
procedure. It consists of three steps. First, the set of $n_c$
common items used to define the transformation is divided into
ten subsets of approximately equal size and approximately equal
average difficulty. Second, the function given in (3) is minimized
ten times. Each minimization is accomplished using all but one of
the item subsets defined in step 1. Finally, the observed variation
among the A and B parameter estimates obtained from the ten
minimizations is used to estimate a covariance matrix which
quantifies uncertainty due to (i) the imprecision of the estimated
item parameters, and (ii) lack of fit of the IRT model. This
procedure is illustrated with data from the National Assessment of
Educational Progress in Section 5.
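Step 1 does not specify how subsets of approximately equal average difficulty are formed. One simple rule, assumed here for illustration, is to sort the common items by their b estimates and deal them out in turn, so that each subset spans the difficulty range. With 76 common items and ten subsets this gives six subsets of 8 items and four of 7, so each leave-one-subset-out run uses 68 or 69 items, the counts that appear in Table 2. The b values are hypothetical.

```python
def jackknife_subsets(b_values, k=10):
    """Partition item indices into k subsets of similar average difficulty
    by sorting on b and dealing the sorted items out round-robin."""
    order = sorted(range(len(b_values)), key=lambda j: b_values[j])
    subsets = [[] for _ in range(k)]
    for rank, j in enumerate(order):
        subsets[rank % k].append(j)
    return subsets


n_c = 76
b_values = [-2.0 + 4.0 * j / (n_c - 1) for j in range(n_c)]  # hypothetical
subsets = jackknife_subsets(b_values)

sizes = [len(s) for s in subsets]
items_per_run = [n_c - s for s in sizes]  # items used when that subset is dropped
print(sizes)          # [8, 8, 8, 8, 8, 8, 7, 7, 7, 7]
print(items_per_run)  # [68, 68, 68, 68, 68, 68, 69, 69, 69, 69]

mean_b = [sum(b_values[j] for j in s) / len(s) for s in subsets]
print([round(m, 2) for m in mean_b])
```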

The jackknife procedure described above measures variation 
arising from two sources: estimation error and model misfit. The 
uncertainty associated with estimation error can often be 
decreased by increasing the size of the calibration samples. To 
decrease the uncertainty associated with model misfit, it is also
necessary to have a large number of linking items. To see this,
note that, if the IRT model were correct, the differences between
sets of (a,b,c) estimates obtained from different, increasingly
large samples of examinees would be accounted for totally by a 
linear transformation. In this case, consistent estimates of the 
linking parameters could be obtained with as few as two linking 
items. When the IRT model does not fit, however, different sets 
of linking items will tend to provide different estimates of the 
linking parameters even as calibration sample sizes increase 
without bound. In this latter case, it is clear that the model 
misfit component of uncertainty can only be reduced by increasing 
the number of linking items. Moreover, the linking items should 
be chosen so as to be representative of the set of all items which 
might have been used to estimate the linking function. 
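The argument can be checked with a toy computation that uses only b parameters. If $b_{j1} = Ab_{j2p} + B$ holds exactly, any two common items with distinct b values determine A and B; once item-specific departures from the model are added, different pairs give different answers however precisely each b is known. The difficulties and drift terms are hypothetical, and solving from b values alone is a simplification of the curve-matching procedure.

```python
def link_from_two_items(b1, b2p):
    """Solve b1 = A*b2p + B exactly from the b values of two common items."""
    A = (b1[0] - b1[1]) / (b2p[0] - b2p[1])
    B = b1[0] - A * b2p[0]
    return A, B


A_true, B_true = 1.12, -0.44
b_prov = [-1.5, -0.6, 0.2, 0.9, 1.7]             # hypothetical
b_fit = [A_true * b + B_true for b in b_prov]    # model holds exactly
drift = [0.05, -0.08, 0.0, 0.06, -0.04]          # hypothetical misfit
b_misfit = [v + e for v, e in zip(b_fit, drift)]

pairs = [(0, 1), (2, 4), (1, 3)]
exact = [link_from_two_items([b_fit[i], b_fit[k]], [b_prov[i], b_prov[k]])
         for i, k in pairs]
drifted = [link_from_two_items([b_misfit[i], b_misfit[k]], [b_prov[i], b_prov[k]])
           for i, k in pairs]
print(exact)    # every pair gives (1.12, -0.44) up to rounding
print(drifted)  # the pairs now disagree
```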

4.0 How the Uncertainty in Linking Procedures Propagates to
Subsequent Analyses

In this section, we show how the uncertainty associated with 
an IRT linking procedure can be accounted for, in the context of 
measuring change. As before, we consider the simple case of only 
two tests sharing a single subset of common items. The first test 
is administered to a group of examinees at time 1. The second 
test is administered to the same group of examinees at time 2. 

Our primary interest is to measure the change in proficiency 
observed over time for individual examinees and for specified 
population subgroups. We assume that a covariance matrix 
quantifying the uncertainty associated with the parameters of the 
linear transformation used to link the two tests has been
estimated (as with a jackknife approximation, for example).

We first consider the problem of estimating the change in
proficiency for a single examinee. Let $\hat\theta_{i1}$ denote a proficiency
estimate calculated for the $i$th examinee at time 1 using the
estimated item parameters which were originally obtained on the
target scale. Let $\hat\theta_{i2p}$ denote a proficiency estimate calculated
for the same examinee at time 2 using the estimated item
parameters which were originally expressed on the provisional
scale. And finally, let $\hat\theta_{i2r}$ denote a proficiency estimate
obtained for the same examinee at time 2 using the item parameters
which were originally estimated on the provisional scale and
subsequently reexpressed on the target scale; that is,
$\hat\theta_{i2r} = \hat A\hat\theta_{i2p} + \hat B$. Since $\hat\theta_{i1}$ and $\hat\theta_{i2r}$ are both expressed on the target
scale, an estimate of the change in proficiency for this examinee
can be obtained from the difference, $\hat D_i = \hat\theta_{i2r} - \hat\theta_{i1}$. If the
parameters of the linking transformation were known without error,
then the standard error of this estimated change would be given by

$$SE(\hat D_i) = \left[SE(\hat\theta_{i2r})^2 + SE(\hat\theta_{i1})^2\right]^{1/2}, \qquad (4)$$

where $SE(\hat\theta_{i2r})$ and $SE(\hat\theta_{i1})$ are the standard errors of the proficiency
estimates $\hat\theta_{i2r}$ and $\hat\theta_{i1}$, respectively. (As is usually the case, we
have also assumed independent errors across tests.)

Now $\hat\theta_{i1}$ will be a function of the item parameters which were
originally estimated on the target scale, whereas $\hat\theta_{i2r}$ will be a
function of the item parameters which were originally estimated on
the provisional scale and then reexpressed on the target scale.
Thus, any procedure which accounts for the uncertainty of the
transformation used to link the two tests will affect the
calculation of $SE(\hat\theta_{i2r})$ but not that of $SE(\hat\theta_{i1})$. Recall that
$\hat\theta_{i2r} = \hat A\hat\theta_{i2p} + \hat B$, and that the estimated standard error of $\hat\theta_{i2p}$,
denoted $\sigma_{\hat\theta_{i2p}}$, can be calculated as a function of item parameters
which have not yet been rescaled and are thus unaffected by the
uncertainty of the linking procedure.

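As a first step, define a covariance matrix for $(\hat\theta_{i2p}, \hat A, \hat B)$
as follows:

$$\Sigma = \begin{pmatrix} \sigma^2_{\hat\theta_{i2p}} & 0 & 0 \\ 0 & \sigma^2_A & \sigma_{AB} \\ 0 & \sigma_{AB} & \sigma^2_B \end{pmatrix},$$

where $\sigma^2_A$, $\sigma^2_B$, and $\sigma_{AB}$ quantify estimation variation for the
parameters A and B of the linking transformation. The quantities
$\sigma^2_A$, $\sigma^2_B$, and $\sigma_{AB}$ can be approximated using the jackknife procedure
given in the previous section. Second, write $g(\theta, A, B) = A\theta + B$ and note that

$$\begin{aligned}
\mathrm{Var}(\hat\theta_{i2r}) &= \mathrm{Var}(\hat A\hat\theta_{i2p} + \hat B) = \mathrm{Var}[g(\hat\theta_{i2p}, \hat A, \hat B)] \\
&\approx \left[\frac{\partial g}{\partial\theta_{i2p}}, \frac{\partial g}{\partial A}, \frac{\partial g}{\partial B}\right] \Sigma \left[\frac{\partial g}{\partial\theta_{i2p}}, \frac{\partial g}{\partial A}, \frac{\partial g}{\partial B}\right]' \\
&= [\hat A, \hat\theta_{i2p}, 1]\, \Sigma\, [\hat A, \hat\theta_{i2p}, 1]' \\
&= \hat A^2\sigma^2_{\hat\theta_{i2p}} + \hat\theta_{i2p}^2\sigma^2_A + 2\hat\theta_{i2p}\sigma_{AB} + \sigma^2_B. \qquad (5)
\end{aligned}$$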
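The quadratic form in (5) is easy to evaluate directly. This sketch computes it both as the matrix product and in expanded form, using the Run 0 estimate of A and the jackknife entries from Table 2, together with the provisional-scale estimate -0.375 and its standard error .422 from the first row of Table 4.

```python
def var_theta_2r(theta_2p, se_2p, A, var_A, var_B, cov_AB):
    """Delta-method variance (5): gradient' * Sigma * gradient."""
    grad = [A, theta_2p, 1.0]  # dg/dtheta_2p, dg/dA, dg/dB for g = A*theta + B
    sigma = [[se_2p ** 2, 0.0, 0.0],
             [0.0, var_A, cov_AB],
             [0.0, cov_AB, var_B]]
    return sum(grad[r] * sigma[r][s] * grad[s]
               for r in range(3) for s in range(3))


def var_theta_2r_expanded(theta_2p, se_2p, A, var_A, var_B, cov_AB):
    """The last line of (5), written out term by term."""
    return (A ** 2 * se_2p ** 2 + theta_2p ** 2 * var_A
            + 2 * theta_2p * cov_AB + var_B)


inputs = dict(theta_2p=-0.375, se_2p=0.422, A=1.122196,
              var_A=0.00512, var_B=0.00740, cov_AB=-0.00238)
v_matrix = var_theta_2r(**inputs)
v_expanded = var_theta_2r_expanded(**inputs)
print(round(v_matrix, 5), round(v_expanded, 5))  # about 0.23417 for both
```

Only the last three terms depend on the linking; for this examinee they sum to about 0.0099, against roughly 0.224 for the term $\hat A^2\sigma^2_{\hat\theta_{i2p}}$.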






Thus, the uncertainty associated with the linking procedure can be
accounted for in the estimated standard error of the difference
$\hat D_i$, as follows:

$$SE(\hat D_i) = \left\{\mathrm{Var}[g(\hat\theta_{i2p}, \hat A, \hat B)] + SE(\hat\theta_{i1})^2\right\}^{1/2}, \qquad (6)$$

where $\mathrm{Var}[g(\hat\theta_{i2p}, \hat A, \hat B)]$ is given as in (5).

The same procedure can also be used to incorporate the
uncertainty associated with the linking parameters A and B in the
estimated standard error of aggregate statistics such as the
difference between two subgroup means. In this latter case, the $\hat\theta$
and $\sigma$ statistics for individuals will be replaced by corresponding
point estimates and standard errors for subgroup means.

5.0 A Numerical Illustration

In this section, data available from the National Assessment
of Educational Progress (NAEP), a congressionally mandated survey
of the educational achievement of American students, are used to
approximate the uncertainty of the Stocking-Lord linking procedure
and to evaluate the consequences of that uncertainty. Data from
two NAEP surveys are used: the 1984 Reading Survey and the 1986
Reading Survey. Both of these surveys were independently scaled
using a three-parameter logistic IRT model. Item parameters were
estimated using BILOG (Mislevy and Bock, 1982) and mean
proficiencies for population subgroups were obtained using the 
plausible values methodology given in Mislevy and Sheehan (1987). 
These data are used to illustrate the consequences of the 
uncertainty of the transformation parameter estimates from the 
Stocking-Lord linking procedure. Because NAEP data support 
inferences about aggregate statistics such as group means but not 
about individuals' proficiencies, we use real NAEP data to 
demonstrate procedures for changes in group means but simulated 
data for changes in individual proficiencies. 

5.1 The NAEP Data 

Mean reading proficiencies for the three age groups which 
were assessed by NAEP in 1984 and 1986 are given in Table 1. The 
first row of the table provides 1984 age group means expressed on 
the 1984 calibration scale. For the purpose of this illustration, 
the 1984 calibration scale is designated as the target scale. The 
second and third rows of the table provide 1986 age group means 
expressed on the provisional scale (the 1986 calibration scale) 
and the target scale (the 1984 calibration scale). The Stocking-Lord
linking procedure was used to estimate the linear
transformation needed to express the 1986 means on the 1984 
calibration scale. The table also provides estimated standard 
errors for each mean. 
Table 1 about here



5.2 Quantifying the Uncertainty of the NAEP Link 

The 1984 NAEP survey contained 128 cognitive reading items. 
The 1986 NAEP survey contained 107 cognitive reading items, 76 of
which were common to the 1984 assessment and 31 of which were
administered for the first time in 1986. The linking 
transformation needed to express the item parameters obtained from 
the calibration of the 1986 data on the scale established by the 
calibration of the 1984 data was estimated using the Stocking-Lord 
linking procedure, as implemented in the TBLT computer program 
(Stocking, 1986). The generally satisfactory results can be seen 
in Figure 1, which shows the TCCs of the first and second 
calibrations of the common items after reexpression, and in Figure 
2, which plots the b-parameter estimates from the first and 
reexpressed second calibrations. The jackknife procedure 
described in Section 3 was used to approximate the uncertainty 
associated with the estimated parameters of the linking
transformation. The results are given in Table 2. 



Figures 1 and 2 and Table 2 about here 
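The jackknife entries at the foot of Table 2 can be recomputed from Runs 1 through 10. The one assumption is the multiplier applied to the sums of squared and cross-product deviations. The textbook delete-a-group jackknife uses (k - 1)/k; the values printed in Table 2 are reproduced, to the digits shown, by a multiplier of k - 1 = 9, so the sketch takes the multiplier as a parameter and computes both.

```python
# Leave-one-subset-out estimates of (A, B): Runs 1-10 of Table 2.
runs = [(1.118018, -0.449670), (1.126296, -0.447837), (1.121856, -0.449472),
        (1.110982, -0.433893), (1.114703, -0.426793), (1.128065, -0.430320),
        (1.125834, -0.446748), (1.128753, -0.440663), (1.112862, -0.447648),
        (1.135424, -0.455858)]


def jackknife_cov(estimates, multiplier):
    """Return (var_A, var_B, cov_AB): multiplier times the sums of squared
    and cross-product deviations of the replicates about their mean."""
    k = len(estimates)
    mean_A = sum(a for a, _ in estimates) / k
    mean_B = sum(b for _, b in estimates) / k
    s_AA = sum((a - mean_A) ** 2 for a, _ in estimates)
    s_BB = sum((b - mean_B) ** 2 for _, b in estimates)
    s_AB = sum((a - mean_A) * (b - mean_B) for a, b in estimates)
    return multiplier * s_AA, multiplier * s_BB, multiplier * s_AB


k = len(runs)
textbook = jackknife_cov(runs, (k - 1) / k)  # usual delete-a-group factor
tabled = jackknife_cov(runs, k - 1)          # factor that matches Table 2
print([round(v, 5) for v in tabled])    # [0.00512, 0.0074, -0.00238]
print([round(v, 6) for v in textbook])  # one tenth of the line above
```

The choice matters downstream: the linking components in (5) and (6) scale directly with these entries.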
5.3 Inference for a Single Examinee

The artificial data set constructed for this analysis 
contained simulated responses for five examinees to two tests. 

The first test consisted of 30 items selected from the 1984 NAEP
reading survey. The second test consisted of 30 items selected
from the 1986 NAEP reading survey, half of which were common to
the 1984 survey. For a given examinee, responses were generated
in accordance with the 3PL, with item parameter estimates for the
first test taken from the 1984 NAEP calibration run and item
parameter estimates for the second test taken from the 1986 NAEP
calibration run. So that the proficiency of a given simulee was
the same on both tests, a value of $\theta$ was specified for the first
test and $(\theta - \hat B)/\hat A$ was used for the second. Simulees' $\theta$ values on
the first test were -1.0, -0.5, 0.0, 0.5, and 1.0. The response
vectors generated according to these specifications are given in 
Table 3. 



Table 3 about here 



Treating the item parameter estimates as known, maximum
likelihood estimates (MLEs) of $\theta$ and associated standard errors
were obtained for each response pattern using the BILOG program.
They are shown in Table 4, with the values for the second test
shown before and after reexpression. Table 5 provides estimated
standard errors for the change from the first test to the second
using (4), which does not take the uncertainty of A and B into
account, and (6), which does. The increase in standard errors is
negligible, about 2 percent on the average. An approximate
variance components analysis is given in Table 6. For each
response pattern considered, the total error variance is estimated
using (6), which includes components due to both sampling and
linking. The contribution due to sampling alone is estimated
using (4) and the contribution due to linking is obtained by
subtraction. The table shows that the linking component is small
for every response pattern considered, accounting for about three
percent of the total error variance on the average (9.0 percent
at most, for the pattern generated at $\theta = -0.5$).
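For the first simulee, the two standard errors in Table 5 follow from (4) and (6) with the inputs in Tables 2 and 4. The sketch takes $SE(\hat\theta_{i2r})$ to be $\hat A$ times $SE(\hat\theta_{i2p})$, an assumption that is consistent with that row of Table 4 (.422 x 1.122 is about .474).

```python
import math

A = 1.122196                                        # Run 0, Table 2
var_A, var_B, cov_AB = 0.00512, 0.00740, -0.00238   # Table 2


def se_change(theta_2p, se_1, se_2p, include_linking):
    """SE of D = theta_2r - theta_1 by (4) or, with linking terms, by (6)."""
    var_2r = (A * se_2p) ** 2  # sampling variance of theta_2r, A treated as known
    if include_linking:
        var_2r += theta_2p ** 2 * var_A + 2 * theta_2p * cov_AB + var_B
    return math.sqrt(se_1 ** 2 + var_2r)


# First simulee in Table 4: SE .625 at time 1; estimate -0.375 (SE .422)
# on the provisional scale at time 2.
se_m1 = se_change(-0.375, 0.625, 0.422, include_linking=False)
se_m2 = se_change(-0.375, 0.625, 0.422, include_linking=True)
print(round(se_m1, 3), round(se_m2, 3))  # 0.784 0.79
```

The linking terms add about 0.01 to a variance of about 0.61, which is why the correction barely moves individual standard errors.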



Tables 4, 5 and 6 about here 



5.4 Inference for Group Means 

The changes in the mean reading proficiencies of students
aged 9, 13 and 17, over the two-year period from 1984 to 1986, as
estimated from the NAEP data, are given in Table 7.* The table
also provides approximate standard errors calculated using (4) and
(6). Whereas the size of standard errors increased by only about
2 percent for estimates of change of individuals, the increase in
standard errors for groups is about 200 percent. An approximate
variance components analysis is given in Table 8. The table shows
that the component due to linking represents between about 84 and
90 percent of the total error variance. To put these
results in another perspective, the change in mean reading
proficiency at each age level is expressed in standard error units
in Table 9. The table shows, for example, that the decrease in
the mean reading proficiency of 9-year-olds is approximately three
standard errors when the uncertainty of the linking procedure is
not accounted for, but only one standard error when it is.

* These figures are shown for illustrative purposes only, and
are not to be taken as estimates of changes in reading proficiency
during the period, due to certain anomalies in the 1985/86 NAEP
data. The interested reader is referred to Beaton (1988) for
further information.
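At the group level the computation is the same, with individual estimates and standard errors replaced by group means and their standard errors, as noted at the end of Section 4. For age 9, the inputs from Tables 1 and 2 reproduce the Table 7 entries. As in the individual case, the 1986 sampling standard error is rescaled by $\hat A$, which matches Table 1 (.025 x 1.122 is about .028). Dividing the rounded Table 7 figures gives the Table 9 values (-0.112/.034 is about -3.29, and -0.112/.105 is about -1.07).

```python
import math

A, B = 1.122196, -0.442910                          # Run 0, Table 2
var_A, var_B, cov_AB = 0.00512, 0.00740, -0.00238   # Table 2

# Age 9 means and standard errors from Table 1.
mean_84, se_84 = -0.752, 0.020     # 1984, target scale
mean_86p, se_86p = -0.375, 0.025   # 1986, provisional scale

change = (A * mean_86p + B) - mean_84
var_sampling = se_84 ** 2 + (A * se_86p) ** 2   # as in (4)
var_linking = mean_86p ** 2 * var_A + 2 * mean_86p * cov_AB + var_B
se_m1 = math.sqrt(var_sampling)
se_m2 = math.sqrt(var_sampling + var_linking)   # as in (6)
print(round(change, 3), round(se_m1, 3), round(se_m2, 3))  # -0.112 0.034 0.105
```

The linking variance is essentially the same size here as for an individual; it dominates only because the sampling variance of a mean is so much smaller.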

Tables 7, 8 and 9 about here 



6.0 Summary 

A common problem in applied work with item response theory is 
to express item parameter estimates from separate calibrations on 
the same scale, based on the multiple estimates for subsets of 
items common to two or more calibrations. Several methods have 
been proposed for estimating the optimal linear transformations 
for this purpose, including the Stocking-Lord (1983) procedure for 
matching test characteristic curves. After the resulting 
transformations have been applied, the uncertainty associated with 
them is rarely taken into account in subsequent analyses of 
individual or group levels of proficiency. 

This uncertainty can be expressed in terms of a covariance 
matrix of estimation errors, which can be approximated empirically 
through a procedure such as the jackknife. With an approximation
of the sampling covariance matrix of estimation errors of the 
parameters of a linking transformation, one can readily derive 
standard errors for change scores or comparisons that take this 
additional uncertainty into account. 

Using data from the 1984 and 1986 reading surveys of the 
National Assessment of Educational Progress, this paper used the 
jackknife to approximate the uncertainty of the linking 
transformation between the two assessments. Its effect was found 
to be negligible in the context of drawing inferences about change
of individuals, since its magnitude was much smaller than the
uncertainty arising from having only the limited numbers of item
responses from individuals that generally characterize individual
testing programs. Correct standard errors were only about 2
percent larger than those that ignored linking uncertainty. The
effect was substantial in the context of estimating group changes,
however, leading to correct standard errors that were about 200
percent larger. The differential impact is due to the fact that sampling
variances of group means are much smaller than sampling variances 
of individual scores, while the sampling variance of the linking 
transformation is the same in both cases. 
Table 1

Mean Proficiencies
Estimated from the 1984 and 1986 NAEP Reading Surveys
With Standard Errors in Parentheses

Year   Scale        Age 9           Age 13          Age 17
84     84 Calib.    -0.752 (.020)   0.150 (.014)    0.766 (.018)
86     86 Calib.    -0.375 (.025)   0.571 (.019)    0.874 (.018)
86     84 Calib.    -0.864 (.028)   0.198 (.022)    0.538 (.020)

The 1984 sample included over 22,000 students at each age
level. The 1986 sample included approximately 7,000 Age 9
students, 6,000 Age 13 students, and 16,000 Age 17 students.
Table 2

Results of the Jackknife Approximation
for the Stocking-Lord Linking Procedure

Run*   Items   A          B
0      76      1.122196   -0.442910
1      68      1.118018   -0.449670
2      68      1.126296   -0.447837
3      68      1.121856   -0.449472
4      68      1.110982   -0.433893
5      68      1.114703   -0.426793
6      68      1.128065   -0.430320
7      69      1.125834   -0.446748
8      69      1.128753   -0.440663
9      69      1.112862   -0.447648
10     69      1.135424   -0.455858

Jackknife Parameter   Estimate
$\sigma^2_A$          0.00512
$\sigma^2_B$          0.00740
$\sigma_{AB}$         -0.00238

* The parameter estimates, A and B, obtained from Run 0 were used
to reexpress the 1986 results on the 1984 scale. The parameter
estimates obtained from Runs 1 through 10 were used only to
estimate the uncertainty of the linking procedure.







Table 3

Simulated Responses to Test 1
Administered at Time 1

Generating
Value         Responses (30 items, in groups of five)
-1.0          11000 11000 10011 00101 00000 01010
-0.5          00110 10101 10000 10011 00111 11001
 0.0          00010 11101 11100 00100 01110 11100
 0.5          11111 01111 11111 00111 01101 11111
 1.0          11111 11111 11111 01111 10110 11111

Simulated Responses to Test 2
Administered at Time 2

Generating
Value         Responses (30 items, in groups of five)
-0.50         00010 01000 00011 11000 10000 00001
-0.05         11001 01000 01011 11101 01100 11000
 0.39         01100 01101 10011 00111 11111 10100
 0.84         00011 11111 10111 11110 11101 01111
 1.29         11111 11111 11111 10111 10110 01110


Table 4

Maximum Likelihood Estimates of Reading Proficiency
At Time 1 and Time 2
For Five Simulated Subjects
With Estimated Standard Errors in Parentheses

                                   Value Estimated at Time 2
Generating    Value Estimated      Before            After
Value         at Time 1            Reexpression      Reexpression
-1.0          -1.062 (.625)        -0.375 (.422)     -0.864 (.474)
-0.5          -0.662 (.489)        -0.116 (.534)     -0.574 (.560)
 0.0          -0.502 (.470)         0.249 (.360)     -0.163 (.404)
 0.5           0.748 (.546)         0.824 (.409)      0.482 (.459)
 1.0           1.177 (.662)         1.434 (.512)      1.172 (.574)
Table 5

An Estimate of the Change in Reading Proficiency
From Time 1 to Time 2
For Five Simulated Subjects*
With Approximate Standard Errors

Change in
Generating    Estimated    S.E.        S.E.
Values        Change       Method 1    Method 2
0              0.198       0.784       0.790
0              0.088       0.743       0.779
0              0.339       0.620       0.625
0             -0.266       0.713       0.718
0             -0.005       0.876       0.883

* Method 1 refers to the method which assumes that the
linking function is known without error, as in equation (4);
Method 2 refers to the method which accounts for the
uncertainty of the linking procedure, as in equation (6).



Table 6

A Comparison of Approximate Variance Components
For Inferences About Change at the Individual Level

                           Component    Component    Linking Variance
Generating    Total        Due to       Due to       as % of
Value         Variance     Sampling     Linking      Total Variance
-1.0          .6241        .6146        .0094        1.5
-0.5          .6068        .5520        .0548        9.0
 0.0          .3906        .3844        .0062        1.6
 0.5          .5155        .5084        .0071        1.4
 1.0          .7797        .7674        .0123        1.6





Table 7

An Estimate of the Change in Mean Reading Proficiency
From 1984 to 1986*
With Approximate Standard Errors

       Estimated    S.E.        S.E.
Age    Change       Method 1    Method 2
9      -0.112       .034        .105
13      0.048       .026        .084
17     -0.228       .027        .066

* Method 1 refers to the method which assumes that the
linking function is known without error, as in equation (4);
Method 2 refers to the method which accounts for the
uncertainty of the linking procedure, as in equation (6).



Table 8

A Comparison of Approximate Variance Components
For Inferences About Change at the Group Level

                     Component    Component    Linking Variance
       Total         Due to       Due to       as % of
Age    Variance*     Sampling     Linking      Total Variance
9      .0110         .0012        .0098        89.5
13     .0071         .0007        .0064        90.1
17     .0044         .0007        .0037        84.1

* Total Variance refers to the estimated variance of the change
in mean reading proficiency from 1984 to 1986.



Table 9

The Estimated Change in Mean Reading Proficiency
from 1984 to 1986
Expressed in Standard Error Units

       Method 1     Method 2
Age    S.E. Units   S.E. Units
9      -3.29        -1.07
13      1.85         0.57
17     -8.44        -3.45
Figure 1

Comparison of Test Characteristic Curves

Solid Line = 1984 Curve
Dashed Line = Reexpressed 1986 Curve
Figure 2

Comparison of Item b Parameter Estimates 
Reexpressed 1986 Estimates vs. 1984 Estimates 